WEB CONTENT TRANSCODING SYSTEM AND METHOD FOR SMALL DISPLAY DEVICE

Technical Field 

The present invention relates to a web content converting system and method and, more
particularly, to a web content transcoding (converting) system and method for a small
display device, in which web content is converted so that it is effectively displayed even
on a small display.

Background Art

Recently, as the development of small-device technologies has accelerated, the grafting of
these technologies onto the Internet has formed a wireless Internet environment and has
begun to satisfy people's desire to use the web [illegible]. However, when a web document
designed adaptively to the display size of a desktop computer is browsed through a small
display device, the content information is not well displayed on the small display
[illegible] the performance of the small display device.

Content converting methods have been proposed. However, since simple conversion into a
text summary was the mainstream of the initial methods for supporting cellular-phone-class
devices or low-performance PDAs (Personal Digital Assistants), the large amount of
information required by the user cannot be displayed well. This is caused by the limits of
device performance and by the use of an Internet markup language with only simple
expressive capability, such as plain text, HDML (Handheld Device Markup Language), or WML
(Wireless Markup Language).

Conventional converting has the drawback that, since only a portion of the existing web
information is extracted and converted, it is difficult to convert exactly a current web
page with a complicated structure in which many images and much information are expressed
simultaneously.

After that, as high-performance PDAs, hand-held personal computers, and similar devices
appeared, converting methods for them have been studied continuously. As a result,
converting tools that operate in a server, such as WebSphere Converting Publisher
manufactured by IBM and Spyglass Prism, have appeared. Such a converting tool uses a method
in which a web server manager performs the conversion through manual work so as to convert
web content more exactly. The converting tool has the disadvantage that non-automatic
converting is performed, and the range of documents for which converting is served is
limited compared with the enormous number of documents on the wired Internet.

WO 2004/040467

PCT/KR2003/002322

Further, as converting methods functioning in the device, there are Smart View, Pad++,
etc., which provide a zoom-in/zoom-out function. Smart View, Pad++, etc. have the advantage
that device performance can be understood more exactly and the user's requirements can be
easily reflected, but have the inconvenience that, after general information on the whole
page is checked with the image, the zoomed-in content must be checked once more, for a
substantial understanding of the content, by using a zoom-in interface at each portion of
the page.

Further, as converting methods functioning at a proxy server, there are Top Gun Wingman,
which provides a converting proxy for a browser of a PalmPilot device, and Digester
[illegible] of handheld or cellular phone devices, etc. The Digester performs the converting
depending on various heuristic converting methods, obtained through converting directly
performed by a person, and suitable application rules therefor. For exact converting, a
plurality of complicated algorithms is used, and information on the converting result is
expressed in a summary [illegible], etc. However, there is a drawback in that the interface
is inconvenient for an information search due to a limited information expression method, a
complicated category structure, and the use of a plurality of hyperlink indexes.

Other conventional arts are well known, as disclosed in "Real-time internet content
converting method and system" in Korean Patent Laid-Open No. 2002-31691 (Application
No. 10-2000-0062342), and in "Content formulation system and method" in Korean Patent
Laid-Open No. 2002-15223 (Application No. 10-2000-0048415). Herein, the "Real-time
internet content converting method and system" uses a predetermined rule such that a
portion of the document content is extracted, page-divided, or converted into other markup
languages. Only converting into a document summary is performed, and neither a document
analysis method nor a re-expression method is disclosed in detail. Further, the
"Content processing system and method thereof" merely refers to a general construction of
a converting system that serves wired web content to small-device users.
Accordingly, the conventional web document converting technique does not reflect the rapid
improvement of device performance, and converting means extraction of only a specific
portion or a content summary, a complicated category structure for expressing it, and page
division with link connection. No detailed proposal can be found for a clear method of
analyzing, converting, and expressing. That is, in most earlier studies, simple
text-summarizing conversion is performed for low-performance cellular-phone-class devices.
Recently, high-performance hand-held devices have appeared, but converting for content
reduction, such as content summary and image deletion, is still mainstream. Alternatively,
a method of page division and page linking is provided, but when the link depth becomes
deep, even without a substantial content summary, it is inconvenient in that the total
content is difficult to understand and the user must return to a previous page again.

Disclosure of the Invention 

Accordingly, the present invention is directed to a system and method for parsing
[illegible] one or more of the problems due to limitations [illegible].

Accordingly, the present invention is directed to a web content converting system and
method for a small display device that substantially obviates one or more problems due to
limitations and disadvantages of the related art.

An object of the present invention is to provide a web content converting system and
method for a small display device in which a current web document including a lot of
complicated information can be converted so as to reflect the content of the original
document to the maximum and, at the same time, to have a convenient interface, in
consideration of the performance improvement of users' devices.

Additional advantages, objects, and features of the invention will be set forth in part in
the description which follows and in part will become apparent to those having ordinary
skill in the art upon examination of the following or may be learned from practice of the
invention. The objectives and other advantages of the invention may be realized and
attained by the structure particularly pointed out in the written description and claims
hereof as well as the appended drawings.
To achieve these objects and other advantages and in accordance with the purpose of the
invention, as embodied and broadly described herein, there is provided a web content
converting system for converting a large display screen web document into a small display
screen web document, the system including: a preprocessor for standardizing a non-standard
web document having an erroneous tag to output the standardized web document in a data
format suitable for analysis; a client profile analyzer for extracting and managing client
information; a structure analyzer for receiving the web document standardized in the
preprocessor to set the web document to a content unit piece (component) according to a
document analysis algorithm; an image converter for extracting information on an image
encoding/decoding procedure and an image size included in the web document; a component
block extractor for grouping the set content unit pieces (components) into similar groups
within a range not exceeding a maximal width by using an attribute value of the content
unit piece (component) and client performance information; a component block categorizer
for categorizing each of the component blocks generated by the component block extractor
into index and body content portions in accordance with a content characteristic; an index
generator for extracting information on an image or text index from the component block
categorized into the index portion, and generating a script file and an additional tag
collection for expressing the extracted information; a voice markup generator for
converting a text-centered body content block into a voice markup language to perform a
voice supporting function; and a HyperText Markup Language (HTML) generator for rearranging
and reconstructing the generated content object elements according to a document pattern to
generate the small display screen web document.

In another aspect of the present invention, there is provided a web content converting
method for converting a large display screen web document into a small display screen web
document, the method including: a preprocessing step for standardizing a non-standard web
document including an erroneous tag to output the standardized web document in a data
format suitable for analysis; a web document analyzing step for receiving the standardized
web document and analyzing tags according to a document analysis algorithm to set the web
document to a content unit piece (component); a component block setting step for grouping
the set content unit pieces (components) into similar groups within a range not exceeding a
maximal width by using an attribute value of the content unit piece (component) and client
performance information; a component block categorizing step for categorizing each of the
component blocks generated by the component block extractor into index and body content
portions in accordance with a content characteristic; an index generating step for
extracting information on an image or text index from the component block categorized into
the index portion, and generating a script file and an additional tag collection for
expressing the extracted information; a voice markup generating step for converting a
text-centered body content block into a voice markup language to perform a voice supporting
function; and a HyperText Markup Language (HTML) generating step for rearranging and
reconstructing the generated content object elements according to a document pattern to
generate the small display screen web document.

According to the above construction and method, the present invention provides a
convenient interface in which the characteristics of the web document are reflected, so
that a lot of current complicated information is expressed simultaneously through
rearrangement by content unit block rather than through the conventional information
extracting and summarizing method. Visual and auditory expression are supported
simultaneously, without left-and-right scrolling, through index generation and
categorization of the content unit blocks and through converting into the format of a
voice-supporting document, rather than through the conventional method of an index
structure having more depth or of page division.

Accordingly, in the present invention, the total web document can be browsed without
left-and-right scrolling through the rearrangement of the content unit blocks, the
extraction of the index block, and various index generating functions that consider the
screen size of the display device; a more convenient interface can be provided by
converting a text-centered content body block into a voice-supporting markup language; and
the content of the original web document can be reflected to the maximum by constructing
the total structure suitably for a small screen size.

It is to be understood that both the foregoing general description and the following
detailed description of the present invention are exemplary and explanatory and are
intended to provide further explanation of the invention.

Brief Description of the Drawings

The accompanying drawings, which are included to provide a further understanding of the
invention and are incorporated in and constitute a part of this application, illustrate
embodiment(s) of the invention and together with the description serve to explain the
principle of the invention. In the drawings:

FIG. 1 is an exemplary view illustrating a web document for expressing content blocks
different from one another through visual categorizing and grouping;

FIG. 2 is a conceptual view illustrating a module construction of a web content converting
system for a small display device according to a preferred embodiment of the present
invention;

FIG. 3 is a view illustrating an expression class relation of a table tag;

FIG. 4 is a flow chart illustrating an operational procedure of a web content converting
system for a small display device according to a preferred embodiment of the present
invention;

FIG. 5 is a flow chart illustrating a detailed algorithm of a web document analyzing step
of FIG. 4;

FIG. 6 is a flow chart illustrating a detailed algorithm of a component block setting step
of FIG. 4;

FIGs. 7A and 7B are exemplary views for describing a web document analyzing step and a
component block extracting step according to a preferred embodiment of the present
invention;

FIG. 8 is a flow chart illustrating a detailed algorithm of a component block categorizing
step of FIG. 4;

FIGs. 9A and 9B are exemplary views [illegible] content according to a preferred
embodiment of the present invention.

Best Mode for Carrying Out the Invention 

Reference will now be made in detail to the preferred embodiments of the present
invention, examples of which are illustrated in the accompanying drawings. Wherever
possible, the same reference numbers will be used throughout the drawings to refer to the
same or like parts.

FIG. 1 is an exemplary view illustrating a web document for expressing content blocks
different from one another through visual categorizing and grouping.

Referring to FIG. 1, the web document is designed so that content having a meaningful
difference is visually categorized using a layout and structural tags, such that the
author of the HTML (HyperText Markup Language) document transmits the content clearly.
Most visual categorizations use tags for structural expression, such as "TABLE", and
accordingly the tags can be analyzed to understand the total structure. At this time, in
consideration of the somewhat injudicious use of tags and of unclear categorization in the
structure and meaning of HTML itself, the analysis utilizes not only the structural tags
but also the attribute values of the tags, the data characteristics of the tags, position
information for expressing the data of the tag objects, etc.

Through the structure analysis of the web document, a minimal content unit piece 101
(called a "component") constructing the visual categorization layout shown in FIG. 1 is
set, and the content unit pieces 101 are grouped in consideration of the performance,
particularly the display performance, of the user device, and are expressed as a content
unit block (called a "component block") 102.

The content unit blocks 102 are categorized into an "index" portion and a "content body"
portion according to the characteristic of the content, and are each re-expressed in a
suitable format. The index portion is re-expressed in the format of an upper select box as
shown in 121 of FIG. 9A, which will be described later, and the body portion is either
merely rearranged as a main content portion without any converting, as shown in 122 of
FIG. 9A, or converted into a voice-supportable document format for expression, as shown in
123 of FIG. 9B.

FIG. 2 is a conceptual view illustrating a module construction of a web content converting
system for a small display device according to a preferred embodiment of the present
invention, and FIG. 4 is a flow chart illustrating an operational procedure of the web
content converting system for the small display device according to a preferred embodiment
of the present invention.

As shown in FIG. 2, the content converting system according to the present invention
includes detailed modules 201 to 209 for performing the operations of a preprocessing step
(S1), a web document analyzing step (S2), a web document converting step (S3), and a web
document generating step (S4).

The preprocessing step (S1) is performed in a preprocessor 201 and a client profile
analyzer 202. The preprocessor 201 standardizes a non-standard web document including an
erroneous tag and outputs the standardized web document in a data format suitable for
analysis. The client profile analyzer 202 performs the function of receiving client
information. The client information can be included in a [illegible] transmission or can be
transmitted using a specific communication protocol. In addition, input/output management
with external modules is performed in the preprocessing step (S1).

In the web document analyzing step (S2), a layout-based structure analyzer 203 receives the
web document standardized in the preprocessing step (S1), and the web document is set to
content unit pieces (components) through a web document analyzing algorithm. An image
converter 204 extracts information on the image encoding/decoding procedure and the image
sizes of the web document.

In the web document converting step (S3), a component block extractor 205 groups the
defined content unit pieces (components) into similar pieces within a range not exceeding
the maximal width (MAX_WIDTH) of a single screen by using information on the client
performance and the attribute values of the content unit pieces (components). A component
block categorizer 206 categorizes each component block into the "index" and "body content"
portions depending on the characteristic of the content.

The web document generating step (S4) performs a procedure of generating the necessary
content objects. An index generator 207 extracts image or text index information from the
index-categorized component block, and generates a script file and an additional tag
collection for expressing the extracted information. An auditory markup generator 208
converts a text-centered body content block into a markup language such as VoiceXML so as
to perform an auditory supporting function. At this time, the browser should provide a
function of rendering the web document of auditory information as sound. Lastly, a
customized HTML generator 209 suitably rearranges and reconstructs the content object
elements generated in the earlier steps according to a document pattern to generate a
customized web document.

FIG. 4 is a flow chart for describing the total operational procedure of FIG. 2.
Referring to the drawing, an original HTML file is inputted, the HTML document is
standardized, and then a data structure in HTML DOM tree format is outputted (401 to 403).
These steps are performed in the preprocessor 201 module of FIG. 2. In the web document
analyzing (HTML tag analyzing) step 404, the tree data is inputted and the tags are
analyzed; this procedure is performed in the structure analyzer 203 and the image
converter 204 of FIG. 2. A detailed algorithm of the web document analyzing step 404 will
be described below with reference to the flow chart of FIG. 5.

After the tag analyzing step, a component block setting step 405 is performed in the
component block extractor 205 of FIG. 2, and a subsequent component block categorizing
step 406 is performed in the component block categorizer 206 of FIG. 2. The algorithms of
the component block setting step 405 and the component block categorizing step 406 are
described with reference to the flow charts of FIGs. 6 and 8.

First, with reference to FIG. 5, a detailed algorithm of the web document analyzing step
404 will be described as follows.

The analysis algorithm of the present invention will be described for the case in which
tags such as <TABLE>, <TR>, <TD>, and <IMG> are mainly used and the specific tag <TD> is
defined as the component, to be used as the minimal unit of content unit analysis.

First, an HTML document tree data structure is inputted, and the maximal screen width
received from the user device is defined as the maximal width "MAX_WIDTH" (501, 502). In
the analyzing procedure, the information in Table 1 is additionally stored in each <TD> tag
node and is later used for extraction of the component blocks.



Table 1

Variable      Content
Width         Width value recalculated in pixel units
Comp_num      Value expressing the ID of the component when a component is set
              General component: (sequence number, 0, 0)
              Nested component: (0, first number of Comp_num of first child,
              first number of Comp_num of last child)
Col_num       Number representing the column at which the component is positioned in
              the layout of the total table structure
Row_num       Number representing the row at which the component is positioned in
              the layout of the total table structure
Table_depth   Number of ancestor <TABLE> tag nodes of the <TD>, that is, the depth of
              the nested table
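
As an illustration only, the per-<TD> information of Table 1 could be held in a small record such as the following Python sketch. The class name `TdInfo` and the helper functions are hypothetical and are not part of the specification; only the five variables and the two Comp_num encodings come from Table 1.

```python
from dataclasses import dataclass
from typing import Tuple

CompNum = Tuple[int, int, int]


@dataclass
class TdInfo:
    """Annotations stored on one <TD> node (the variables of Table 1)."""
    width: int = 0                  # width recalculated in pixels
    comp_num: CompNum = (0, 0, 0)   # component ID triple
    col_num: int = 0                # column position in the total table layout
    row_num: int = 0                # row position in the total table layout
    table_depth: int = 0            # number of ancestor <TABLE> nodes


def general_comp_num(sequence_number: int) -> CompNum:
    """General component: (sequence number, 0, 0)."""
    return (sequence_number, 0, 0)


def nested_comp_num(first_child: CompNum, last_child: CompNum) -> CompNum:
    """Nested component: (0, first number of first child, first number of last child)."""
    return (0, first_child[0], last_child[0])


def is_nested(comp_num: CompNum) -> bool:
    """A nested component has 0 as its first number and a child range after it."""
    return comp_num[0] == 0 and comp_num[1] > 0
```

For example, a nested component whose first and last children are the general components 4 and 7 would be numbered (0, 4, 7).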



After the initialization of the global variables is finished in step 502, all of the tag
nodes are visited in preorder sequence while the following procedure is performed
repetitively (503).

In case the visited node is a <TABLE> tag (504), the table depth (Table_depth) is checked
(505), and in case the critical value (e.g., 3) is exceeded, the <TABLE> tag and all of its
subordinate child nodes are regarded as general content and only a width setting step
(506) is performed, without any further analysis. In case the table depth (Table_depth)
does not exceed the critical value (e.g., 3), the value of the table depth (Table_depth)
is increased by one (507).

In case the visited node is a <TR> tag (508), the row number (Row_num) is increased (509).
However, in case of the first row of a nested table, the row number is not increased.
Further, in case of a <TR> tag of the root table, the column number (Col_num) is
initialized to zero.

In case the visited node is a <TD> tag (510), it is determined whether content is included
(511), and the column number (Col_num) is increased (512). However, the column number is
not increased for the first <TD> of a <TR> of a nested table. The width setting step 522
is performed in case the <TD> does not include content and is used for layout expression,
and the component is set and structural information is added in case content is included.

That is, the component is defined as a <TD> tag block having content. If a <TABLE> tag is
included as a child of the component (513), the component is set as a nested component and
the value of the component number (Comp_num) is marked as shown in Table 1 (514); in case
tags other than <TABLE> are included as the content, the component is set as a general
component and the component number (Comp_num) is defined by an increased sequence number
(515).
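
The component-setting part of this traversal can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure of FIG. 5 itself: the `Node` class and the helper constructors are hypothetical, row and column numbering and width setting are omitted, the table depth is passed down the recursion instead of being kept in a global variable, and a nested component is numbered from the first and last sequence numbers assigned inside it, which is one reading of Table 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

MAX_TABLE_DEPTH = 3  # the "critical value (e.g., 3)" mentioned in the text


@dataclass
class Node:
    tag: str  # "table", "tr", "td" or "#text" (hypothetical tag names)
    children: List["Node"] = field(default_factory=list)
    comp_num: Optional[Tuple[int, int, int]] = None
    table_depth: int = 0


def text() -> Node:
    return Node("#text")


def td(*content: Node) -> Node:
    return Node("td", list(content))


def tr(*cells: Node) -> Node:
    return Node("tr", list(cells))


def table(*rows: Node) -> Node:
    return Node("table", list(rows))


def contains_table(node: Node) -> bool:
    """True if any descendant of `node` is a <TABLE>."""
    return any(c.tag == "table" or contains_table(c) for c in node.children)


def annotate(root: Node) -> List[Node]:
    """Visit nodes in preorder and return the <TD> nodes set as components."""
    components: List[Node] = []
    seq = 0  # last sequence number handed out to a general component

    def visit(node: Node, depth: int) -> None:
        nonlocal seq
        if node.tag == "table":
            if depth > MAX_TABLE_DEPTH:
                return  # too deeply nested: treated as opaque general content
            depth += 1
        elif node.tag == "td" and node.children:  # a <TD> that has content
            node.table_depth = depth
            components.append(node)
            nested = contains_table(node) and depth <= MAX_TABLE_DEPTH
            first = seq + 1
            if nested:
                for child in node.children:
                    visit(child, depth)
            if nested and seq >= first:
                node.comp_num = (0, first, seq)  # nested component
            else:
                seq += 1
                node.comp_num = (seq, 0, 0)  # general component
            return
        for child in node.children:
            visit(child, depth)

    visit(root, 0)
    return components
```

Applied to a tree shaped like the example of FIG. 7B (three cells in the first row, two nested tables in the second, one cell in the last), this sketch yields the general numbers 1 to 16 and the nested numbers (0,4,7), (0,8,15), and (0,9,14).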

Referring to the expression class relation view of the <TABLE> tag of FIG. 3, a 
tag kind that can be included in the <TD> tag can be checked. Referring to the drawings, 
the table is categorized into TR and CAPTION, and the TR is categorized into TH and TD. 

In case the visited node is an <IMG> tag (516), the width is checked and then changed
(517, 518). If the width is changed, it is checked whether an image map is set. If the
image map is set, the COORDS attribute value of the image map code <AREA>, which represents
coordinate values, is modified using the formula of 520. In the width setting procedure of
step 518, a value set in % is converted into pixels, the width is replaced with the maximal
width (MAX_WIDTH) in case the width exceeds the maximal width (MAX_WIDTH), and, if the
width attribute value is not set, it is inferred using the <TB> width, the sum of the <TD>
widths, the maximal <IMG> width, etc.
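
The width setting rules can be illustrated with the following Python sketch. The function names are hypothetical. The formula of 520 appears only in the drawing and is not reproduced in the text, so the proportional scaling of COORDS used here is an assumption, as is the restriction to plain integer and percentage width values.

```python
from typing import Optional


def resolve_width(width_attr: Optional[str], parent_width: int,
                  max_width: int) -> Optional[int]:
    """Return a pixel width clamped to max_width, or None if no width is set.

    When None is returned, the text infers the width from surrounding
    widths; that inference is not modeled in this sketch.
    """
    if width_attr is None or not width_attr.strip():
        return None
    value = width_attr.strip()
    if value.endswith("%"):
        # A percentage is taken relative to the enclosing width (assumption).
        pixels = round(parent_width * float(value[:-1]) / 100)
    else:
        pixels = int(value)
    return min(pixels, max_width)


def scale_coords(coords: str, old_width: int, new_width: int) -> str:
    """Scale an <AREA> COORDS list by new_width / old_width (assumed formula)."""
    ratio = new_width / old_width
    return ",".join(str(round(int(c) * ratio)) for c in coords.split(","))
```

With a hypothetical MAX_WIDTH of 600, `resolve_width("650", 800, 600)` gives 600 and `resolve_width("50%", 800, 600)` gives 400; an image map on an image reduced from 800 to 400 pixels would have each coordinate halved.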
FIGs. 7A and 7B are exemplary views for describing the web document analyzing step and the
component block extracting step according to a preferred embodiment of the present
invention.

Through the example of FIGs. 7A and 7B, the structural information obtained from the
algorithm of FIG. 5 is checked.

In FIG. 7A, which illustrates the visual expression of the structural tags, the <TABLE>,
<TR>, and <TD> blocks are expressed, and a component is set for each <TD> tag block having
content. The additional information is shown in the following Table 2. In FIG. 7B, which
expresses the same tag collection as FIG. 7A in a tree model of the structural tags, the
class relation between the tags can be easily understood.
Table 2

(A)          Comp_num   Row_num  Col_num      Table_depth  Width
1            (1,0,0)    1        1            1            200
2            (2,0,0)    1        2            1            400
3            (3,0,0)    1        3            1            200
[illegible]  (0,4,7)    2-5      1-1          1            150
4            (4,0,0)    2        1            2            150
5            (5,0,0)    3        1            2            150
6            (6,0,0)    4        1            2            150
7            (7,0,0)    5        1            2            150
[illegible]  (0,8,15)   2-5      2-4          1            650 -> MAX_WIDTH
8            (8,0,0)    2        2            2            650 -> MAX_WIDTH
[illegible]  (0,9,14)   3-5      2-3          2            400
9            (9,0,0)    3        2            3            200
10           (10,0,0)   3        3            3            200
11           (11,0,0)   4        2            3            200
12           (12,0,0)   4        3            3            200
13           (13,0,0)   5        2            [illegible]  200
14           (14,0,0)   5        3            3            200
15           (15,0,0)   3        [illegible]  [illegible]  250
16           (16,0,0)   6        1            1            800 -> MAX_WIDTH



In the above Table 2, (A) is the first number of the component number (Comp_num) indicated
in FIGs. 7A and 7B, and it is assumed that the maximal width [illegible].

Next, a component block is created by bundling all of the tag collections included in a
component unit into a single <TD> of a separate <TABLE> tag, which is inserted at the same
position as the upper ancestor <TABLE>.

With reference to FIG. 6 and FIG. 7B, the detailed algorithm of the component block
setting step (405) will be described as follows.

First, the component tree (Component_tree) is inputted and the information on the initial
width of all component nodes is checked, and then the following procedure is performed
when the maximal width (MAX_WIDTH) is exceeded (601 - 604). It is determined whether the
current component node (A) has a sibling node, and if there is a sibling node, a grouping
procedure is performed for bundling similar sibling nodes within a range not exceeding the
maximal width [illegible] FIG. 7B, the components (1), (2), and (3) can be grouped as (1),
(2), (3) or as (1)(2), (3).

In the following table blocking step (608), all of the tag collections belonging to each
of the groups are expressed as one table block in a format such as
"<TABLE><TR>Component(s)</TR></TABLE>". If there is no sibling node, only the table
blocking procedure for the component node is performed in step 608.
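
The grouping and table blocking steps can be sketched in Python as follows. This is an illustration under assumptions: the text groups "similar" sibling nodes without defining the similarity test, so the sketch bundles adjacent siblings greedily by width only, and the function names are hypothetical.

```python
from typing import List


def group_siblings(widths: List[int], max_width: int) -> List[List[int]]:
    """Greedily bundle adjacent siblings so each group's width <= max_width.

    Returns groups of sibling indices. A single sibling wider than
    max_width is counted as max_width, mirroring the width clamping.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    total = 0
    for index, width in enumerate(widths):
        width = min(width, max_width)
        if current and total + width > max_width:
            groups.append(current)
            current, total = [], 0
        current.append(index)
        total += width
    if current:
        groups.append(current)
    return groups


def table_block(td_markup: List[str]) -> str:
    """Wrap the <TD> markup of one group in a single-row table block."""
    return "<TABLE><TR>" + "".join(td_markup) + "</TR></TABLE>"
```

For the first-row widths of Table 2 (200, 400, 200), a hypothetical MAX_WIDTH of 600 groups the first two components together and leaves the third alone, while a MAX_WIDTH of 300 keeps all three separate.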

In the table block rearranging step 609, the table block newly generated in the preceding
procedure is inserted as a previous sibling node of the <TABLE> node (B), which is the
grandparent node of (A).

If (A) is the last <TD> node of (B) (610) and (B) is a nested table (611), a next step is
performed (612); otherwise, the next node is visited and the earlier procedure is repeated
from step 602.

The next step is performed when the components [illegible] of FIG. 7B are (A), that is,
the component currently being visited. In case the upper ancestor <TD> having (B) as a
child, that is, (C), is a nested component, step 609 is performed. In other words, for the
components [illegible] of FIG. 7B, the respective (C) becomes [illegible] and [illegible].
With reference to the child node (701 of FIG. 7B) that includes (B) among the child nodes
of (C), all sibling nodes on the left and right sides are each bundled into table blocks
(702, 703 of FIG. 7B). Then the table block including (C) is generated (614), and step 609
is performed repetitively.

The component is extracted as one expression unit through the table blocking, and the
extracted component is defined as the component block. Each of the component blocks has an
arrangement sequence determined according to the position of the component in the tree,
and is expressed in the shape of a table block, from top to bottom depending on the
sequence.

Referring next to FIG. 8, the detailed algorithm of the component block categorizing step
406 will be described.

The component block tree is inputted and all component blocks are visited while the
content pattern of each component block is compared (801 - 803). The comparative variables
usable at this time are arranged in the following Table 3.

Table 3

Variable      Expected pattern
Text_Length   Similar repetition, limited short length
Image_Width   Similar repetition, limited width
Link_Number   Almost all contents have link information. Comparing position of
              connected document, similarity of file name
Row_num       Limiting to small number. Limiting to block arranged at upper stage in
              web document
Col_num       Limiting to maximal or minimal value. Limiting to block arranged at left
              or right side.



Depending on whether or not the result value of the pattern comparison exceeds a certain
critical value, the index type (INDEX type) is determined (804, 805). For a component block
determined to be an index (INDEX), the type value is set to image index (INDEX_I) or text
index (INDEX_T) (806 - 808) depending on whether the data type of its content is image or
text.
A block that is not an index (INDEX) is categorized as body (BODY), and is further
categorized, according to the relative importance of text in the content included, as
either a voice body (BODY_V) type, for converting into a voice-supportable document, or a
general body (BODY_G) type, processed like other general content blocks (809 - 812). In
case the block is not the last block in step 813, the procedure is performed starting from
step 802 for the next block.
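
The categorization into the four types can be sketched in Python as follows. The text does not disclose the scoring function or the critical values, so everything numeric here is a hypothetical stand-in: the sketch scores only link density and short text length (two of the Table 3 variables), and the thresholds `INDEX_THRESHOLD`, `SHORT_TEXT`, and `VOICE_TEXT_SHARE` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

INDEX_THRESHOLD = 0.6    # hypothetical critical value for the pattern score
SHORT_TEXT = 20          # hypothetical "limited short length" in characters
VOICE_TEXT_SHARE = 0.8   # hypothetical share of text items needed for BODY_V


@dataclass
class Block:
    text_lengths: List[int]  # length of each text item in the block
    image_count: int         # number of images in the block
    link_count: int          # number of items carrying a link


def index_score(block: Block) -> float:
    """Score in [0, 1]; high for many links and short, repetitive texts."""
    items = len(block.text_lengths) + block.image_count
    if items == 0:
        return 0.0
    link_ratio = min(block.link_count / items, 1.0)
    if block.text_lengths:
        short = sum(1 for n in block.text_lengths if n <= SHORT_TEXT)
        short_ratio = short / len(block.text_lengths)
    else:
        short_ratio = 1.0  # an image-only block has no long text
    return 0.5 * link_ratio + 0.5 * short_ratio


def categorize(block: Block) -> str:
    """Return INDEX_I, INDEX_T, BODY_V or BODY_G for one component block."""
    if index_score(block) > INDEX_THRESHOLD:
        if block.image_count > len(block.text_lengths):
            return "INDEX_I"
        return "INDEX_T"
    items = len(block.text_lengths) + block.image_count
    text_share = len(block.text_lengths) / items if items else 0.0
    return "BODY_V" if text_share >= VOICE_TEXT_SHARE else "BODY_G"
```

Under these assumed thresholds, a block of four short linked labels is typed INDEX_T, four linked images INDEX_I, a single long article text BODY_V, and a block mixing one text with three unlinked images BODY_G.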

- The .aft^^tegorizattSn procedure be des 
chart showing a.total ^ oper^pni prbcedi^e of HG. 4'. 

Referring to the drawing, after the component blocks are categorized (407 - 409,
412), the steps 411, 413, 414 of FIG. 4 are performed, or the component block is
extracted as is (410), according to the type of each component block. This procedure is
performed for all component blocks (415), and the blocks are suitably arranged in
the last step 416 to generate a new HTML document (417). The operation procedure for
each type of component block is as follows.

If the type of the component block is the voice body (BODY_V) (Type =
BODY_V), a voice document generating step (411) is performed to generate the voice
supporting document. This is performed in the voice markup generator 208 module of FIG.
2, and all text portions in the block can be added as the <prompt> value, as in the sample
code of the following Table 4, to generate a simple VoiceXML document. The generated
document is stored as a separate file and is connected with a link in the original HTML.

Table 4

<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt>
        (Adding text information extracted from Block categorized as BODY_V,
        to value)
      </prompt>
      <disconnect/>
    </block>
  </form>
</vxml>
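
As a rough illustration of the voice document generating step (411), the following sketch fills the Table 4 skeleton with the text of a BODY_V block. The function names `escapeXml` and `buildVoiceXml` are hypothetical, and the XML escaping is an assumption added here so that characters such as "&" or "<" in the extracted text do not break the generated document.

```javascript
// Escape characters that are not allowed as-is inside XML text content.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build a simple VoiceXML document following the skeleton of Table 4.
// blockText: the text extracted from a block categorized as BODY_V.
function buildVoiceXml(blockText) {
  return [
    '<?xml version="1.0"?>',
    '<vxml version="1.0">',
    "  <form>",
    "    <block>",
    "      <prompt>" + escapeXml(blockText) + "</prompt>",
    "      <disconnect/>",
    "    </block>",
    "  </form>",
    "</vxml>",
  ].join("\n");
}
```

The returned string would then be stored as a separate file and linked from the converted HTML, as described above.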

If the type of the component block is the general body (BODY_G) (Type =
BODY_G), it is a general content element and is extracted as is for rearrangement.

If the type of the component block is the image index (INDEX_I) (Type =
INDEX_I), the image index (Image Index) expressed in JavaScript is generated through the
image index generating step (413). As in the sample code of the
following Table 5, a simple script file is automatically generated, and the image file is
mapped for its embodiment.

Table 5

// javascript filled into HEAD
<SCRIPT LANGUAGE="JavaScript">
<!--
// preload one image per index entry
image1 = new Image();
image1.src = "image1.gif";
image2 = new Image();
image2.src = "image2.gif";
image3 = new Image();
image3.src = "image3.gif";
image4 = new Image();
image4.src = "image4.gif";

// link target for each index entry
links = new Array;
links[0] = "LINK#1";
links[1] = "LINK#2";
links[2] = "LINK#3";
links[3] = "LINK#4";

// show the image that belongs to the selected entry
function imgchange(){
  var imageNum = document.form.selImage.selectedIndex + 1;
  fname = eval("image" + imageNum + ".src");
  document.img.src = fname;
}

// move to the link of the selected entry
function go(){
  location = links[document.form.selImage.selectedIndex];
}

// show the link of the selected entry in the status bar
function showlink(){
  window.status = links[document.form.selImage.selectedIndex];
}
//-->
</SCRIPT>

// form tag filled into BODY
<FORM name="form">
<SELECT NAME="selImage" size=1 onChange="imgchange();">
<OPTION>Index 1
<OPTION>Index 2
<OPTION>Index 3
<OPTION>Index 4
</SELECT>
</FORM>
<A HREF=...>
<IMG SRC="image1.gif" ...>
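
The automatic generation of the script in Table 5 might look roughly like the sketch below. It is an assumption about how the image index generating step (413) could emit the preload statements and the links array; the function name `buildImageIndexScript` and the entry fields `src` and `href` are hypothetical.

```javascript
// Emit the preload statements and the links array of Table 5
// for a list of image index entries.
// entries: [{ src: "image1.gif", href: "LINK#1" }, ...] (assumed shape)
function buildImageIndexScript(entries) {
  const lines = [];
  entries.forEach((entry, i) => {
    const n = i + 1; // Table 5 numbers the images from 1
    lines.push(`image${n} = new Image();`);
    // JSON.stringify yields a correctly quoted JavaScript string literal
    lines.push(`image${n}.src = ${JSON.stringify(entry.src)};`);
  });
  lines.push("links = new Array;");
  entries.forEach((entry, i) => {
    lines.push(`links[${i}] = ${JSON.stringify(entry.href)};`);
  });
  return lines.join("\n");
}
```

The fixed functions imgchange(), go() and showlink() do not depend on the entries and could simply be appended to this output unchanged.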

If the type of the component block is the text index (INDEX_T) (Type =
INDEX_T), the index information is expressed as text and is re-expressed using the
<select> tag, as shown in the following Table 6, through the text index generating step 414.
The image index generating step (413) and the text index generating step (414) are
performed in the index generator 207 module of FIG. 2, and the index information can be
extracted in a general manner.
Table 6

// javascript filled into HEAD
<script language="JavaScript">
<!--
function change(form) {
  var list = form.selectedIndex;
  location type = form.options[list].value;

  // location type is selected among the followings
  // - self.location.href : linking to frame belonging to oneself
  // - top.location.href : all screen is changed irrespective of frame
  // - parent.location.href : parent frame including oneself is changed
  // - parent.framename.location.href : linking to child frame having selected name
  //   among parent frames
  form.selectedIndex = 0;
}
//-->
</script>

// form tag filled into BODY
<form name="formname" method="get">
<select name="form" onchange="change(document.formname.form)">
<option selected>index list</option>
<option value="link #1">index 1</option>
<option value="link #2">index 2</option>
<option value="link #3">index 3</option>
</select>
</form>
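
The re-expression of a text index as a <select> list can be sketched as follows. This is a minimal sketch under assumptions: `buildTextIndexSelect`, `escapeHtml` and the entry fields `label` and `href` are hypothetical names, and passing `this` to change() is a simplification chosen here rather than the form reference used in Table 6.

```javascript
// Escape characters that would otherwise break HTML text or attribute values.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Turn the link entries of an INDEX_T block into a <select> list.
// entries: [{ label: "index 1", href: "link #1" }, ...] (assumed shape)
function buildTextIndexSelect(entries) {
  const options = entries.map(
    (entry) =>
      `<option value="${escapeHtml(entry.href)}">${escapeHtml(entry.label)}</option>`
  );
  return [
    '<select name="form" onchange="change(this)">',
    "<option selected>index list</option>",
    ...options,
    "</select>",
  ].join("\n");
}
```

A drop-down list occupies a single line on the small display regardless of how many links the original index block held, which is presumably why the <select> tag is used here.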

After each component block is expressed in an appropriate method according to
the content characteristic as described above, the content objects are arranged and the
document is generated through the new HTML constructing and generating step 416 performed
in the HTML generator 209 of FIG. 2. The sample code of the following Table 7 provides the
tag construction of the total HTML and a simple arranging method of each content object.

Table 7

<HTML>
<HEAD>
<TITLE></TITLE>
<SCRIPT> --> enclosing script file automatically generated by Java Script
             Generator module.
             This is added in case Image Index is generated.
</SCRIPT>
</HEAD>
<BODY> --> Attaching Component Block categorized into INDEX_T or
           BODY_G into BODY tag.
<SELECT>
<OPTION> --> generating select list form as many as Text Index and arranging
             respective values with Option tag.
</SELECT>
<TABLE>
<TR>
<TD> --> arranging including each of Component Blocks categorized into
         BODY_G as value of TABLE TD. At this time, width of total table newly generated is
         determined according to display performance information represented in client profile.
<IMG src="speaker.gif"><A href="***.xml"> listening to content (Title)
</A> --> connect BODY_V block converted into VoiceXML
</TD>
</TR>
</TABLE>
</BODY>
</HTML>
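
The arrangement outlined in Table 7 could be assembled along the following lines. This is a sketch under assumptions, not the generator of FIG. 2: `buildSmallDisplayPage`, `profile.displayWidth`, and the fields of `parts` are hypothetical, and the inputs are assumed to be markup fragments that have already been escaped.

```javascript
// Assemble the converted page following the layout of Table 7.
// profile.displayWidth:  display width taken from the client profile (assumed field)
// parts.textIndexSelect: <select> markup generated for the text index
// parts.generalBodies:   markup of each BODY_G block, one table row per block
// parts.voiceDocs:       [{ file, title }] for BODY_V blocks converted to VoiceXML
function buildSmallDisplayPage(profile, parts) {
  const bodyRows = parts.generalBodies.map(
    (html) => `<TR><TD>${html}</TD></TR>`
  );
  const voiceRows = parts.voiceDocs.map(
    (doc) =>
      `<TR><TD><A href="${doc.file}">listening to content (${doc.title})</A></TD></TR>`
  );
  return [
    "<HTML>",
    "<HEAD><TITLE></TITLE></HEAD>",
    "<BODY>",
    parts.textIndexSelect,
    // The table is limited to the client's display width so that
    // no left and right scrolling is needed.
    `<TABLE width="${profile.displayWidth}">`,
    ...bodyRows,
    ...voiceRows,
    "</TABLE>",
    "</BODY>",
    "</HTML>",
  ].join("\n");
}
```

Placing each block in its own row stacks the blocks vertically, so only the table width has to be matched to the client profile.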

The inventive content converting system as described above can be placed on any of
three layers, namely a web server, a client, or a proxy, each of which has merits and demerits
depending on its environment. Further, the extraction algorithm of the component and the
component block can be embodied in various methods, and the index generating
and voice document generating methods described here are exemplified as one of several
embodying methods.

FIGs. 9A and 9B are exemplary views illustrating a converting result of the web
content according to a preferred embodiment of the present invention.

FIG. 9A illustrates a resultant page of the web document converted through the
rearrangement of the content unit object and the index extraction, and FIG. 9B illustrates a
resultant page in the case where the voice supporting markup creating function is added
to the resultant page of FIG. 9A.

Industrial Applicability 

As described above, the present invention provides a new technique and system so
that a web document prepared to suit the display performance of the existing
general desktop personal computer is converted to be effectively expressed even on a
small display when the user of a small display device intends to use a web service by
connecting to the wireless Internet. According to the present invention, the web document is
set to content unit pieces by analyzing the structural tag information, is bundled into
similar content unit groups, and is then categorized into index or body content on the basis
of the content information for rearrangement, such that browsing with a
convenient interface, without left and right scrolling of the total web page, is provided.
Further, the extraction and generation of the index and the converting into the voice
supporting web document are also provided together, to provide various reconstructions of
the web document and an expression effect that considers the characteristics of the small
device. Further, the content of the original document is maintained to the maximum,
so that its meaning is delivered clearly.
It will be apparent to those skilled in the art that various modifications and
variations can be made in the present invention. Thus, it is intended that the present
invention covers the modifications and variations of this invention provided they come
within the scope of the appended claims and their equivalents.

Claims

1. A web content converting system for converting a large display
screen web document into a small display screen web document, the system
comprising:

a preprocessor for standardizing a non-standard web document having an
erroneous tag to output the standardized web document in a data format suitable for
analysis;

a client profile ...;

a structure analyzer for receiving the web document standardized in the
preprocessor to set the web document to a content unit piece (component)
according to a document analysis algorithm;

an image converter for extracting information on an image
encoding/decoding procedure and an image size included in the web document;

a component block extractor for grouping the set content unit piece
(component) to similar groups within a range not exceeding a maximal width by
using an attribution value of the content unit piece (component) and client
performance information;

a component block categorizer for categorizing each of component blocks
generated by the component block extractor into index and body content portions in
accordance with a content characteristic;

an index generator for extracting information on image or text index from
the component block categorized into the index portion, and generating a script file
and an additional tag collection for expressing the extracted information;

a voice markup generator for converting a text-centered body content block
into a voice markup language to perform a voice supporting function; and

a HyperText Markup Language (HTML) generator for rearranging and
reconstructing the generated content object elements according to a document
pattern to generate the small display screen web document.

2. The web content converting system of claim 1, wherein the web
content converting system is installed at any one of three layers of a web server, a
client and a proxy.

3. A web content converting method for converting a large display
screen web document into a small display screen web document, the method
comprising:

a preprocessing step for standardizing a non-standard web document
including an erroneous tag to output the standardized web document in a data
format suitable for analysis;

a web document analyzing step for receiving the standardized web
document and analyzing a tag according to a document analysis algorithm to set
the web document to a content unit piece (component);

a component block setting step for grouping the set content unit piece
(component) to similar groups within a range not exceeding a maximal width by
using an attribution value of the content unit piece (component) and client
performance information;

a component block categorizing step for categorizing each of component
blocks generated by the component block extractor into index and body content
portions in accordance with a content characteristic;

an index generating step for extracting information on image or text index
from the component block categorized into the index portion, and generating a
script file and an additional tag collection for expressing the extracted information;

a voice markup generating step for converting a text-centered body content
block into a voice markup language to perform a voice supporting function; and

a HyperText Markup Language (HTML) generating step for rearranging
and reconstructing the generated content object elements according to a document
pattern to generate the small display screen web document.

4. The web content converting method of claim 3, wherein in the
web document analyzing step, a tag such as <TABLE>, <TR>, <TD>, <IMG>, etc.
is mainly analyzed, and a specific <TD> tag is defined as a component to be used
as a minimal unit for the content unit analysis.

5. The web content converting method of claim 3, wherein in the
component block setting step, a component tree is inputted to check initial width
information for all component nodes, and it is checked whether or not a sibling
node of a current component node exists, and if existing, similar sibling nodes are
bundled and grouped within the range not exceeding the maximal width
(MAX_WIDTH).

6. The web content converting method of claim 3, wherein the
component block categorizing step comprises the steps of:

receiving a component block tree to visit all component blocks while
comparing a content pattern of the component block;

determining index type if a resultant value of the pattern comparison
exceeds a certain critical value;

setting a type of the index-determined block to each of an image index
(INDEX_I) or a text index (INDEX_T) depending on whether a data type of the
content is an image or a text; and

categorizing the block ...
the voice body (BODY_V) for performing the converting into the voice supporting
document and the general body (BODY_G) processed as other general content
blocks.